On Factors Affecting the Usage and Adoption of a Nation-wide TV Streaming Service
Using nine months of access logs comprising 1.9 billion sessions to BBC
iPlayer, we survey the UK ISP ecosystem to understand the factors affecting
adoption and usage of a high bandwidth TV streaming application across
different providers. We find evidence that connection speeds are important and
that external events can have a huge impact on live TV usage. Then, through a
temporal analysis of the access logs, we demonstrate that data usage caps
imposed by mobile ISPs significantly affect usage patterns, and we look for
solutions. We show that product bundle discounts with a related fixed-line ISP,
a strategy already employed by some mobile providers, can better support user
needs and capture a bigger share of accesses. We observe that users regularly
split their sessions between mobile and fixed-line connections, suggesting a
straightforward strategy for offloading by speculatively pre-fetching content
from a fixed-line ISP before access on mobile devices. Comment: In Proceedings of IEEE INFOCOM 201
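The offloading idea in the last sentence can be sketched as a measurement over session logs. This is a minimal sketch, not the paper's method: it assumes a hypothetical log schema (`Session` with `user`, `content`, `start`, `access`, `mbytes`) and counts a mobile session as pre-fetchable when the same user had already started the same content over a fixed-line connection.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Hypothetical log schema; field names are illustrative only.
    user: str
    content: str
    start: float   # session start time, seconds since epoch
    access: str    # "fixed" or "mobile"
    mbytes: float  # data transferred in the session, in MB


def prefetchable_share(sessions):
    """Share of mobile traffic whose content the same user had already
    started on a fixed-line connection, i.e. content that could have been
    pre-fetched over the fixed line before the mobile access."""
    started_on_fixed = set()
    mobile_total = 0.0
    offloadable = 0.0
    for s in sorted(sessions, key=lambda x: x.start):
        if s.access == "fixed":
            started_on_fixed.add((s.user, s.content))
        elif s.access == "mobile":
            mobile_total += s.mbytes
            if (s.user, s.content) in started_on_fixed:
                offloadable += s.mbytes
    return offloadable / mobile_total if mobile_total else 0.0


logs = [
    Session("u1", "ep1", 0, "fixed", 100),
    Session("u1", "ep1", 10, "mobile", 50),  # split session: pre-fetchable
    Session("u2", "ep2", 5, "mobile", 30),   # no earlier fixed-line access
    Session("u2", "ep2", 20, "fixed", 20),
]
print(prefetchable_share(logs))  # 0.625
```

The sketch ignores how long before the mobile access the content was started; a time window would be a natural refinement.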
SPoT: Representing the Social, Spatial, and Temporal Dimensions of Human Mobility with a Unifying Framework
Modeling human mobility is crucial in the analysis and simulation of opportunistic networks, where contacts are exploited as opportunities for peer-to-peer message forwarding. The current approach to human mobility modeling has been to continuously modify models, trying to embed in them the mobility properties (e.g., visiting patterns to locations or specific distributions of inter-contact times) as they emerged from trace analysis. As a consequence, with these models it is difficult, if not impossible, to modify the features of mobility or to control the exact shape of mobility metrics (e.g., modifying the distribution of inter-contact times). For these reasons, in this paper we propose a mobility framework rather than a mobility model, with the explicit goal of providing a flexible and controllable tool for mathematically modeling, and generating through simulation, different possible features of human mobility. Our framework, named SPoT, is able to incorporate the three dimensions of human mobility: spatial, social, and temporal. SPoT does this by mapping the different social communities of the network onto different locations, which their members visit with a configurable temporal pattern. In order to characterize the temporal patterns of user visits to locations and the relative positioning of locations based on their shared users, we analyze traces of real user movements extracted from three location-based online social networks (Gowalla, Foursquare, and Altergeo). We observe that a Bernoulli process effectively approximates user visits to locations in the majority of cases, and that locations that share many common users visiting them frequently tend to be located close to each other. In addition, we use these traces to test the flexibility of the framework, and we show that SPoT is able to accurately reproduce the mobility behavior observed in traces. Finally, relying on the Bernoulli assumption for arrival processes, we provide a thorough mathematical analysis of the controllability of the framework, deriving the conditions under which heavy-tailed and exponentially-tailed aggregate inter-contact times (often observed in real traces) emerge.
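To make the Bernoulli visit assumption concrete, the sketch below (our own illustration, not SPoT's implementation) treats time as discrete slots in which each of two users visits a shared location independently with a fixed probability. The users are in contact when both are present, so under this assumption the gap between successive contact slots is geometric with mean 1/(p1 * p2). The probabilities and slot count are arbitrary choices.

```python
import random


def contact_gaps(p1, p2, slots, seed=0):
    """Simulate two users who each visit the same location in every time
    slot independently (Bernoulli trials with probabilities p1 and p2) and
    return the gaps, in slots, between successive slots where both are
    present."""
    rng = random.Random(seed)
    gaps = []
    last = None
    for t in range(slots):
        # Both users present in slot t -> a contact occurs.
        if rng.random() < p1 and rng.random() < p2:
            if last is not None:
                gaps.append(t - last)
            last = t
    return gaps


gaps = contact_gaps(0.5, 0.4, 200_000)
mean_gap = sum(gaps) / len(gaps)
print(round(mean_gap, 2), "vs expected", 1 / (0.5 * 0.4))
```

Aggregating such gaps over many pairs with different visit probabilities yields a mixture of geometric distributions, which is where the question of heavy versus exponential aggregate tails arises.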
Identifying Geographic Clusters: A Network Analytic Approach
In recent years there has been a growing interest in the role of networks and
clusters in the global economy. Despite being a popular research topic in
economics, sociology and urban studies, geographical clustering of human
activity has often been studied by means of predetermined geographical units
such as administrative divisions and metropolitan areas. This approach is
intrinsically time invariant and it does not allow one to differentiate between
different activities. Our goal in this paper is to present a new methodology
for identifying clusters that can be applied to different empirical settings.
We use a graph approach based on k-shell decomposition to analyze world
biomedical research clusters based on PubMed scientific publications. We
identify research institutions and locate their activities in geographical
clusters. Leading areas of scientific production and their top performing
research institutions are consistently identified at different geographic
scales.
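The k-shell decomposition mentioned above assigns each node the largest k such that it belongs to a subgraph in which every node has at least k neighbours. A minimal pure-Python sketch of the standard peeling procedure follows (networkx provides the same computation as `networkx.core_number`). How the paper builds its graph from PubMed publications is not described here, so the toy edge list is purely illustrative.

```python
from collections import defaultdict


def k_shell(edges):
    """Return {node: shell index} for an undirected graph given as an edge list.

    Peeling: for k = 0, 1, 2, ..., repeatedly remove every node whose
    remaining degree is <= k; each node removed at level k lies in the
    k-shell (its core number is k).
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        stack = [n for n in remaining if degree[n] <= k]
        if not stack:
            k += 1
            continue
        while stack:
            n = stack.pop()
            if n not in remaining:
                continue  # already peeled via another neighbour
            remaining.remove(n)
            shell[n] = k
            for m in adj[n]:
                if m in remaining:
                    degree[m] -= 1
                    if degree[m] <= k:
                        stack.append(m)
    return shell


# A triangle (a, b, c) with a pendant node d attached to a:
# d lies in shell 1, while a, b and c lie in shell 2.
print(k_shell([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]))
```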
ISP-friendly Peer-assisted On-demand Streaming of Long Duration Content in BBC iPlayer
In search of scalable solutions, CDNs are exploring P2P support. However, the
benefits of peer assistance can be limited by various obstacle factors such as
ISP friendliness - requiring peers to be within the same ISP, bitrate
stratification - the need to match peers with others needing similar bitrate,
and partial participation - some peers choosing not to redistribute content.
This work relates potential gains from peer assistance to the average number
of users in a swarm (its capacity) and empirically studies the effects of these
obstacle factors at scale, using a month-long trace of over 2 million users in
London accessing BBC shows online. Results indicate that even when P2P swarms
are localised within ISPs, up to 88% of traffic can be saved. Surprisingly,
bitrate stratification results in 2 large sub-swarms and does not significantly
affect savings. However, partial participation and the need for a minimum
swarm size do affect gains. We investigate how increasing content availability
can improve gains, using two well-studied techniques: content bundling -
combining multiple items to increase availability, and historical caching of
previously watched items. Bundling proves ineffective as increased server
traffic from larger bundles outweighs benefits of availability, but simple
caching can considerably boost traffic gains from peer assistance. Comment: In Proceedings of IEEE INFOCOM 201
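The effect of ISP friendliness on savings can be illustrated with a toy model. This is our simplification, not the paper's model: each concurrent viewer downloads one unit, and in a swarm of n >= `min_swarm` viewers the server uploads one copy while peers supply the other n - 1. Grouping swarms by (content, ISP) instead of by content alone splits swarms and therefore lowers savings.

```python
from collections import Counter


def peer_savings(viewers, isp_local=True, min_swarm=2):
    """Fraction of traffic offloaded from the server in a toy model.

    viewers: list of (content_id, isp) pairs, one per concurrent viewer,
    each downloading one unit of data. In a swarm of n >= min_swarm
    viewers the server sends one copy and peers supply the other n - 1;
    smaller swarms are served entirely by the server.
    """
    # ISP-friendly swarms only contain peers from the same ISP.
    swarm_of = (lambda v: v) if isp_local else (lambda v: v[0])
    swarms = Counter(swarm_of(v) for v in viewers)
    total = sum(swarms.values())
    saved = sum(n - 1 for n in swarms.values() if n >= min_swarm)
    return saved / total if total else 0.0


viewers = [("ep1", "A"), ("ep1", "A"), ("ep1", "B"), ("ep2", "B")]
print(peer_savings(viewers, isp_local=False))  # 0.5
print(peer_savings(viewers, isp_local=True))   # 0.25
```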
Geo-Spotting: Mining Online Location-based Services for Optimal Retail Store Placement
The problem of identifying the optimal location for a new retail store has
been the focus of past research, especially in the field of land economy, due
to its importance in the success of a business. Traditional approaches to the
problem have factored in demographics, revenue and aggregated human flow
statistics from nearby or remote areas. However, the acquisition of relevant
data is usually expensive. With the growth of location-based social networks,
fine grained data describing user mobility and popularity of places has
recently become attainable.
In this paper we study the predictive power of various machine learning
features on the popularity of retail stores in the city through the use of a
dataset collected from Foursquare in New York. The features we mine are based
on two general signals: geographic, where features are formulated according to
the types and density of nearby places, and user mobility, which includes
transitions between venues or the incoming flow of mobile users from distant
areas. Our evaluation suggests that the best performing features are common
across the three different commercial chains considered in the analysis,
although variations may exist too, as explained by heterogeneities in the way
retail facilities attract users. We also show that performance improves
significantly when combining multiple features in supervised learning
algorithms, suggesting that the retail success of a business may depend on
multiple factors. Comment: Proceedings of the 19th ACM SIGKDD international conference on
Knowledge discovery and data mining, Chicago, 2013, Pages 793-80
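As an illustration of the geographic signals described above, the sketch below computes two features for a candidate location: the number of venues within a radius (density) and the fraction of those venues in the same category as the store (competition). The feature definitions, the 200 m radius and the venue dictionary layout are assumptions for illustration, not the paper's exact formulation.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def geo_features(candidate, venues, category, radius_m=200.0):
    """Density and competition features for a candidate store location.

    candidate: (lat, lon); venues: dicts with "lat", "lon", "category".
    """
    lat, lon = candidate
    nearby = [v for v in venues
              if haversine_m(lat, lon, v["lat"], v["lon"]) <= radius_m]
    competitors = sum(1 for v in nearby if v["category"] == category)
    return {
        "density": len(nearby),
        "competition": competitors / len(nearby) if nearby else 0.0,
    }


venues = [
    {"lat": 40.7580, "lon": -73.9855, "category": "coffee"},
    {"lat": 40.7589, "lon": -73.9855, "category": "bar"},     # ~100 m north
    {"lat": 40.7680, "lon": -73.9855, "category": "coffee"},  # ~1.1 km north
]
print(geo_features((40.7580, -73.9855), venues, "coffee"))
# {'density': 2, 'competition': 0.5}
```

In a supervised setup, such per-location features would be combined with mobility features as inputs to a model predicting store popularity.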
Wearing Many (Social) Hats: How Different are Your Different Social Network Personae?
This paper investigates whether, when users create profiles on different
social networks, these profiles are redundant expressions of the same persona
or are adapted to each platform. Using the personal webpages of 116,998 users on
About.me, we identify and extract matched user profiles on several major social
networks including Facebook, Twitter, LinkedIn, and Instagram. We find evidence
for distinct site-specific norms, such as differences in the language used in
the text of the profile self-description, and the kind of picture used as
profile image. By learning a model that robustly identifies the platform given
a user's profile image (0.657–0.829 AUC) or self-description (0.608–0.847
AUC), we confirm that users do adapt their behaviour to individual platforms in
an identifiable and learnable manner. However, different genders and age groups
adapt their behaviour differently from each other, and these differences are,
in general, consistent across different platforms. We show that differences in
social profile construction correspond to differences in how formal or informal
the platform is. Comment: Accepted at the 11th International AAAI Conference on Web and Social
Media (ICWSM17)
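The AUC values above summarise how well a classifier ranks profiles from the target platform above others: AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counting one half. A direct, quadratic-time sketch of that definition follows; it is generic and says nothing about the paper's features or models (scikit-learn's `roc_auc_score` computes the same quantity for binary labels).

```python
def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count 1/2). O(P * N); fine for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking, so the reported ranges indicate platform signals that are detectable but imperfect.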
Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler
Active learning with strong and weak labelers considers a practical setting
where we have access to both costly but accurate strong labelers and inaccurate
but cheap predictions provided by weak labelers. We study this problem in the
streaming setting, where decisions must be made online. We design a
novel algorithmic template, Weak Labeler Active Cover (WL-AC), that is able to
robustly leverage the lower quality weak labelers to reduce the query
complexity while retaining the desired level of accuracy. Prior active learning
algorithms with access to weak labelers learn a difference classifier which
predicts where the weak labels differ from strong labelers; this requires the
strong assumption of realizability of the difference classifier (Zhang and
Chaudhuri, 2015). WL-AC bypasses this realizability assumption and is thus
applicable to many real-world scenarios, such as randomly corrupted weak labels
and high-dimensional families of difference classifiers (e.g., deep neural
nets). Moreover, WL-AC trades off evaluating the quality of weak labelers
against fully exploiting them, which allows it to convert any active learning
strategy into one that can leverage weak labelers. We provide an instantiation
of this template that achieves the optimal query complexity for any given weak
labeler, without knowing its accuracy a priori. For the experiments, we propose
an instantiation of the WL-AC template that can be efficiently implemented for
large-scale models (e.g., deep neural nets) and show its effectiveness on the
corrupted-MNIST dataset, where it significantly reduces the number of labels
while keeping the same accuracy as passive learning.
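The trade-off between checking a weak labeler and exploiting it can be illustrated with a much simpler scheme than WL-AC. The sketch below is a hypothetical audit-then-trust rule, not the paper's algorithm and without its guarantees: it queries the strong labeler on every example until `min_audits` comparisons exist, then trusts the weak label whenever the observed disagreement rate is at most `max_err`, while still auditing a random `audit_rate` fraction of the stream. All parameter names and defaults are illustrative.

```python
import random


def label_stream(xs, strong, weak, audit_rate=0.1, max_err=0.2,
                 min_audits=20, seed=0):
    """Label a stream online, mixing a costly strong labeler with a cheap
    weak one. Returns (labels, number_of_strong_queries)."""
    rng = random.Random(seed)
    audits = disagreements = strong_queries = 0
    labels = []
    for x in xs:
        w = weak(x)
        trusted = audits >= min_audits and disagreements / audits <= max_err
        # Query the strong labeler when the weak one is not yet trusted,
        # or occasionally anyway to keep estimating its error rate.
        if not trusted or rng.random() < audit_rate:
            s = strong(x)
            strong_queries += 1
            audits += 1
            disagreements += (s != w)
            labels.append(s)
        else:
            labels.append(w)
    return labels, strong_queries


xs = list(range(1000))
truth = lambda x: x % 2
_, q_good = label_stream(xs, strong=truth, weak=truth)
_, q_bad = label_stream(xs, strong=truth, weak=lambda x: 1 - x % 2)
print(q_good, q_bad)  # few strong queries vs. one per example
```

Unlike this fixed-threshold rule, WL-AC works with a difference classifier over the input space rather than a single global error rate.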
Modeling and understanding the role of human mobility in the cyber-physical world
Modeling human mobility is important in the context of smart cities, as it can assist the design of pervasive systems and intelligent services in the city. In synthetic mobility models, dynamic processes in the city are modeled by means of either simulation or mathematical analysis. Traditional synthetic approaches are usually limited by the state-of-the-art findings in human mobility analysis and fail to update when new results come up from trace analysis. Moreover, the understanding of the connection between different mobility characteristics is missing from the existing synthetic models. This implies that there is no direct way to control the output of the models (e.g., statistics of contacts between people) using the input parameters (e.g., human mobility patterns). In this work we propose a mobility framework that can be instantiated to the required mobility settings and produce controllable output. The framework is built around the three dimensions of human movements, namely social, spatial and temporal. The social environment in the framework is customized by taking the social graph as input. Then the spatial dimension is added by distributing communities of tightly connected users across common meeting places and assigning them to physical locations. The temporal dimension of human arrivals to places is modeled with stochastic point processes. We demonstrate the flexibility of the framework by showing that it can reproduce realistic mobility behavior observed in the mobility traces collected from online location-based social networks. Additionally, we show that the framework can produce controllable output by providing a thorough mathematical analysis of the contact statistics in different mobility settings.
Alternatively, data-driven models are used when the system under analysis is not well formalized but its behavior can be traced and further studied from the traces. In data-driven models, relations between properties of the system and patterns of human movements are mined directly from the data with the help of machine learning. In the second part of this work we develop a data-driven methodology to study the impact of human mobility on the retail quality of locations in the city. With respect to existing work in this direction, we aim to assess the extent to which the new layers of information available in location-based social networks can assist geographic retail analysis. We study co-location patterns of various venues in the city and propose a methodology to assess the flows of users between them. We exploit the result of this analysis to tackle the optimal business placement problem for three different retail chains in New York. We formalize this problem as a data-mining task where we aim to predict the potential popularity of a store if placed in a given area. We devise a number of signals to describe the area, including place-geographic features (e.g., density and heterogeneity of places) and mobility-based features (e.g., flows of users towards and inside the area). We show that the presence of place attractors (e.g., airports, train stations) and competing venues in the area are strong indicators of popularity across all considered chains. However, the best performance is achieved when we consider the fusion of mobility and place-geographic features.
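One simple way to estimate flows of users between venues, offered as an assumption rather than the thesis's exact methodology, is to count transitions between consecutive check-ins of the same user that occur within a time window. The check-in tuple layout and the 3-hour window below are illustrative choices.

```python
from collections import Counter, defaultdict


def venue_flows(checkins, max_gap_s=3 * 3600):
    """Count user transitions between venues.

    checkins: iterable of (user, venue, timestamp_seconds) tuples.
    A transition a -> b is counted when the same user checks in at a and
    then at b (a != b) within max_gap_s seconds.
    """
    by_user = defaultdict(list)
    for user, venue, ts in checkins:
        by_user[user].append((ts, venue))
    flows = Counter()
    for visits in by_user.values():
        visits.sort()  # chronological order per user
        for (t1, v1), (t2, v2) in zip(visits, visits[1:]):
            if v1 != v2 and t2 - t1 <= max_gap_s:
                flows[(v1, v2)] += 1
    return flows


checkins = [
    ("u1", "station", 0), ("u1", "cafe", 600),      # counted
    ("u2", "station", 100), ("u2", "cafe", 900),    # counted
    ("u3", "cafe", 0), ("u3", "gym", 5 * 3600),     # gap too long
]
print(venue_flows(checkins))
```

Summing `flows[(a, b)]` over venues `b` inside a candidate area gives one possible mobility-based feature of the kind used for the placement task.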